Differentiating information transfer and causal effect

The concepts of information transfer and causal effect have received much recent attention, yet
often the two are not appropriately distinguished and certain measures have been suggested to be 
suitable for both. We discuss two existing measures, transfer entropy and information flow, which 
can be used separately to quantify information transfer and causal information flow respectively. We 
apply these measures to cellular automata on a local scale in space and time, in order to explicitly 
contrast them and emphasize the differences between information transfer and causality. We also 
describe the manner in which the measures are complementary, including the circumstances under 
which the transfer entropy is the best available choice to infer a causal effect. We show that causal 
information flow is a primary tool to describe the causal structure of a system, while information 
transfer can then be used to describe the emergent computation in the system.

I. INTRODUCTION

Information transfer is currently a popular topic in complex systems science, with recent
investigations spanning cellular automata, biological signaling networks, and agent-based
systems. In general, information transfer refers to a directional signal or communication of
dynamic information from a source to a destination. However, the body of literature regarding
quantification of information transfer appears to subsume two concepts: predictive or
computational information transfer, and causal effect or information flow. That correlation is
not causation is well understood. Yet while authors increasingly consider the notions of
information transfer and information flow and how they fit with our understanding of
correlation and causality, several questions nag. Is information transfer akin to causal
effect? If not, what is the distinction between them? When examining the "effect" of one
variable on another (e.g. between brain regions), should one seek to measure information
transfer or causal effect? Despite the interest in this area, it remains unclear how the
notion of information transfer should sit with the concepts of predictive transfer and
causal effect.

Predictive transfer refers to the amount of information that a source variable adds to the
next state of a destination variable; i.e. "if I know the state of the source, how much does
that help to predict the state of the destination?". This transferred information can be
thought of as adding to the prediction of an observer, or as being transferred into the
computation taking place at the destination [13]; as such, we will also refer to this as the
computational perspective.

Causal effect refers to the extent to which the source variable has a direct influence or
drive on the next state of a destination variable, i.e. "if I change the state of the source,
to what extent does that alter the state of the destination?". Information from causal effect
can be seen to flow through the system, like injecting dye into a river [10]. In an
Aristotelian sense, we restrict our interpretation to efficient cause here.

Unfortunately, these concepts have become somewhat tangled in discussions of information
transfer. Measures for both predictive transfer [9] and causal effect [10] have been inferred
to capture information transfer in general, and measures of predictive transfer have been used
to infer causality, with the two sometimes (problematically) directly equated.

The notion of information transfer remains cloudy while it is used interchangeably to refer to
both concepts. Our thesis in this paper is that the concepts of predictive transfer and causal
effect are quite distinct: we aim to clarify them and describe the manner in which they should
be considered separately. We argue that the concept of predictive transfer (or the
computational perspective) is more closely aligned with the popularly understood notion of
information transfer, while causal information flow should be considered separately as a
useful notion in its own right. Using the perspective of information theory (e.g. see [20]),
we contend that these concepts are properly quantified by the existing measures known as
transfer entropy [9] and information flow [10] respectively, and we use these measures to
contrast the concepts.

For this comparison, we examine Cellular Automata (CAs) (e.g. see [21]): discrete dynamical
lattice systems involving an array of cells which synchronously update their states as a
homogeneous deterministic function of the states of their local neighbors. In particular we
focus on Elementary CAs (ECAs), which consist of a one-dimensional array of cells with binary
states, with each updated as a function of the previous states of themselves and one neighbor
either side (i.e. neighborhood size 3 or range $r = 1$). These previous neighborhood states and
the recursive chain of their previous neighborhood states form the past light-cone of a cell
(i.e. the set of all points capable of having a causal effect on it) [22]. CAs provide a
well-known example of complex dynamics, since certain rules (e.g. ECA rules 110 and 54; see
[21] regarding the numbering scheme) exhibit emergent structures which are not discernible from
their microscopic update functions but which provide the basis for understanding the
macroscopic computations carried out in the CAs [23]. These structures include particles,
which are coherent structures traveling against a background domain region. Regular or
periodic particles are known as gliders. Particles and gliders are important here because they
are popularly understood to embody information transfer in the intrinsic computation in the
CA [13].
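
As a concrete illustration of the ECA update just described, the following is a minimal sketch in Python with NumPy. It assumes periodic boundary conditions and the standard rule numbering (bit $q$ of the rule number gives the output for the neighborhood whose three cells, read left to right, form the binary number $q$). The function names `eca_step` and `run_eca` are ours, chosen for illustration only.

```python
import numpy as np

def eca_step(cells: np.ndarray, rule: int) -> np.ndarray:
    """Advance a 1D binary CA by one synchronous step (periodic boundaries assumed)."""
    left = np.roll(cells, 1)     # state of x_{i-1} aligned at index i
    right = np.roll(cells, -1)   # state of x_{i+1} aligned at index i
    neighborhood = 4 * left + 2 * cells + right   # integer 0..7 per cell
    rule_table = (rule >> np.arange(8)) & 1       # output bit for each neighborhood
    return rule_table[neighborhood]

def run_eca(rule: int, width: int, steps: int, seed: int = 0) -> np.ndarray:
    """Return a (steps + 1, width) space-time array from a random initial row."""
    rng = np.random.default_rng(seed)
    history = np.empty((steps + 1, width), dtype=np.int64)
    history[0] = rng.integers(0, 2, size=width)
    for n in range(steps):
        history[n + 1] = eca_step(history[n], rule)
    return history

# A single live cell under rule 54 spreads to both neighbors in one step.
single = np.array([0, 0, 1, 0, 0])
after_rule_54 = eca_step(single, 54)
```

Each row of the array returned by `run_eca` is one time step, so time increases down the array.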

In particular, we examine the transfer entropy and in- 
formation flow measures on a local scale in space and 
time in ECAs, in order to provide an explicit compari- 
son between the two. This is the first presentation and 
examination of the local information flow. We demon- 
strate that transfer entropy as predictive transfer is more 
closely aligned with the notion of information transfer, 
since it alone is associated with emergent coherent infor- 
mation transfer structures, i.e. particles in cellular au- 
tomata. We also demonstrate that causality stands sep- 
arately as a useful concept itself, with information flow 
identifying causal relations in the domain region of the 
CA and demonstrating the bounds of influence without 
being confused by correlations. Additionally, we present 
parameter settings under which a variant of the transfer 
entropy may be used to provide an approximation of the 
information flow. 

On the basis of these results, we suggest that informa- 
tion flow should be used first wherever possible in order 
to establish the set of causal information contributors 
for a given destination variable. Subsequently, transfer 
entropy may be used to quantify the information trans- 
fer from these causal sources to the destination to study 
emergent computation in the system. 



II. PREDICTIVE INFORMATION TRANSFER 

A. Transfer entropy 

Schreiber presented transfer entropy as a measure for information transfer [9] in order to
address deficiencies in the previous de facto measure, mutual information, the use of which
was criticized in this context as a symmetric measure of statically shared information.
Transfer entropy is defined as the deviation from independence (in bits) of the state
transition of an information destination $X$ from the previous state of an information
source $Y$ [29]:

$$T_{Y \to X} = \sum_{w_n} p(w_n) \log_2 \frac{p(x_{n+1} \mid x_n^{(k)}, y_n)}{p(x_{n+1} \mid x_n^{(k)})}, \qquad (1)$$

where $n$ is a time index, $x_n^{(k)}$ refers to the $k$ states of $X$ up to and including
$x_n$, and $w_n$ is the state transition tuple $(x_{n+1}, x_n^{(k)}, y_n)$. It can be viewed as
a conditional mutual information, casting it as the average information in the source about
the next state of the destination that was not already contained in the destination's past $k$
states. To ensure that no information in the destination's past is mistaken as transfer here,
one should take the limit $k \to \infty$, though in practice finite-$k$ estimates must be used
[1]. This conditioning on the past makes the transfer entropy a directional, dynamic measure
of information transfer, but it remains a measure of observed (conditional) correlation rather
than direct effect. In fact, the transfer entropy is a nonlinear extension of a concept known
as the "Granger causality" [24], the nomenclature for which may have added to the confusion
associating information transfer and causal effect.
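
To make the definition concrete, the following is a minimal sketch of a plug-in (frequency-counting) estimate of the transfer entropy for two discrete time series. It is an illustration under stated assumptions rather than the estimator used for the results in this paper: the function name `transfer_entropy` and the toy data are ours, and probabilities are simply replaced by observed relative frequencies.

```python
import random
from collections import Counter
from math import log2

def transfer_entropy(source, dest, k=1):
    """Plug-in estimate (bits) of the transfer entropy from `source` to `dest`,
    conditioning on the past k states of `dest`."""
    joint = Counter()  # counts of tuples (x_{n+1}, x_n^{(k)}, y_n)
    for n in range(k - 1, len(dest) - 1):
        past = tuple(dest[n - k + 1 : n + 1])
        joint[(dest[n + 1], past, source[n])] += 1
    total = sum(joint.values())
    past_src, next_past, past_only = Counter(), Counter(), Counter()
    for (x_next, past, y), c in joint.items():
        past_src[(past, y)] += c
        next_past[(x_next, past)] += c
        past_only[past] += c
    te = 0.0
    for (x_next, past, y), c in joint.items():
        p_with_source = c / past_src[(past, y)]                   # p(x_{n+1} | past, y_n)
        p_without = next_past[(x_next, past)] / past_only[past]   # p(x_{n+1} | past)
        te += (c / total) * log2(p_with_source / p_without)
    return te

# Toy data (an assumption for illustration): x copies y with a one-step delay.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(5000)]
x = [0] + y[:-1]
te_y_to_x = transfer_entropy(y, x, k=1)   # close to 1 bit
te_x_to_y = transfer_entropy(x, y, k=1)   # close to 0 bits
```

The asymmetry of the two estimates illustrates the directionality that mutual information lacks. Note that both are computed from observations alone, so they quantify conditional correlation rather than the effect of an intervention.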



B. Local transfer entropy 

The transfer entropy is an average (or expectation value) of a local transfer entropy [1] at
each observation $n$, i.e. $T_{Y \to X} = \langle t_{Y \to X}(n+1) \rangle$ where:

$$t_{Y \to X}(n+1) = \log_2 \frac{p(x_{n+1} \mid x_n^{(k)}, y_n)}{p(x_{n+1} \mid x_n^{(k)})}. \qquad (2)$$

For lattice systems such as CAs with spatially-ordered agents, the local transfer entropy to
agent $X_i$ from $X_{i-j}$ at time $n+1$ is represented as:

$$t(i, j, n+1, k) = \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n})}{p(x_{i,n+1} \mid x_{i,n}^{(k)})}. \qquad (3)$$

The transfer entropy $t(i, j = 1, n+1, k)$ to agent $X_i$ from $X_{i-1}$ at time $n+1$ is
illustrated in Fig. 1(a). $t(i, j, n, k)$ is defined for every spatiotemporal destination
$(i, n)$, for every information channel or direction $j$; sensible values for $j$ correspond to
causal information sources, i.e. for CAs, sources within the cell range $|j| \le r$. We write
the average for these lattice systems as $T(j, k) = \langle t(i, j, n, k) \rangle$.
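
The local values in Eq. (3) can be sketched for a simulated ECA as follows. This is a hedged illustration: the helper names `run_eca` and `local_transfer_entropy` are ours, periodic boundaries and a random initial row are assumed, and probabilities are estimated by pooling observations over all cells and time steps (relying on the homogeneity of the CA).

```python
from collections import Counter
from math import log2
import numpy as np

def run_eca(rule, width, steps, seed=0):
    """Simulate an ECA with periodic boundaries from a random initial row."""
    rng = np.random.default_rng(seed)
    table = (rule >> np.arange(8)) & 1
    rows = [rng.integers(0, 2, size=width)]
    for _ in range(steps):
        x = rows[-1]
        rows.append(table[4 * np.roll(x, 1) + 2 * x + np.roll(x, -1)])
    return np.array(rows)

def local_transfer_entropy(ca, j, k):
    """Local apparent transfer entropy t(i, j, n+1, k) in bits.
    Rows 0..k-1 of the result are left at zero (no full history is available there)."""
    n_rows, width = ca.shape
    samples = {}
    for n in range(k - 1, n_rows - 1):
        for i in range(width):
            past = tuple(int(v) for v in ca[n - k + 1 : n + 1, i])
            samples[(n, i)] = (int(ca[n + 1, i]), past, int(ca[n, (i - j) % width]))
    c_full = Counter(samples.values())
    c_past_src = Counter((p, y) for _, p, y in samples.values())
    c_next_past = Counter((x, p) for x, p, _ in samples.values())
    c_past = Counter(p for _, p, _ in samples.values())
    local = np.zeros((n_rows, width))
    for (n, i), (x, p, y) in samples.items():
        p_with = c_full[(x, p, y)] / c_past_src[(p, y)]
        p_without = c_next_past[(x, p)] / c_past[p]
        local[n + 1, i] = log2(p_with / p_without)
    return local

# Illustrative run on rule 54 with a short history; k = 16 as used later in this
# paper would need far more observations than this small example provides.
ca = run_eca(54, width=200, steps=100, seed=1)
t_local = local_transfer_entropy(ca, j=1, k=4)
T_average = t_local[4:].mean()
```

Averaging the local values over all observed $(i, n)$ recovers the plug-in estimate of $T(j, k)$, which is the sense in which the transfer entropy is an expectation of its local values.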

The transfer entropy may also be conditioned on other possible causal information sources, to
eliminate their influence from being attributed to the source in question $Y$ [9]. In general,
this means conditioning on all sources $Z$ in $X$'s set of causal information contributors $V$
(except for $Y$) with joint state $v_{x,y,n}$, giving the local complete transfer entropy [1]:

$$t^c_{Y \to X}(n+1, k) = \log_2 \frac{p(x_{n+1} \mid x_n^{(k)}, y_n, v_{x,y,n})}{p(x_{n+1} \mid x_n^{(k)}, v_{x,y,n})}, \qquad (4)$$

$$v_{x,y,n} = \{ z_n \mid \forall Z \in V,\ Z \neq Y, X \}. \qquad (5)$$

For CAs this means conditioning on other sources $v_{i,j,n}$ within the range $r$ of the
destination to obtain

$$t^c(i, j, n+1, k) = \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n}, v_{i,j,n})}{p(x_{i,n+1} \mid x_{i,n}^{(k)}, v_{i,j,n})}, \qquad (6)$$

$$v_{i,j,n} = \{ x_{i+q,n} \mid \forall q : -r \le q \le +r,\ q \neq -j, 0 \}. \qquad (7)$$

In deterministic systems (e.g. CAs), complete conditioning renders $t^c(i, j, n) \ge 0$ because
the source can only add information about the outcome of the destination. Calculations
conditioned on no other information contributors (as in Eq. (3)) are labeled as apparent
transfer entropy.

Finally, note that the information (or local entropy) $h(i, n+1)$ required to predict the next
state of a destination can be decomposed as a sum of [13]:

• the information gained from the past of the destination (i.e. the mutual information between
the past $x_{i,n}^{(k)}$ and next state $x_{i,n+1}$, known as the active information storage
$a(i, n+1, k)$); plus

• the information gained from each causal source considered (in arbitrary order) in the context
of that past, incrementally conditioning each contribution on the previously considered
sources.

For example, in ECAs we have:

$$h(i, n+1) = a(i, n+1, k) + t(i, j = -1, n+1, k) + t^c(i, j = 1, n+1, k). \qquad (8)$$

In this way, the different forms of the transfer entropy as information transfer can be seen
to characterize important components of the total information at the destination.



III. CAUSAL EFFECT 

It is well-recognized that measurement of causal effect necessitates some type of perturbation
or intervention of the source so as to detect the effect of the intervention on the
destination (e.g. see [25], [26]). Attempting to infer causality without doing so leaves one
measuring correlations of observations, regardless of how directional they may be [10]. Here,
we adopt the measure information flow for this purpose, and describe how to apply it on a
local scale.



A. Information flow

Following Pearl's probabilistic formulation of causal Bayesian networks [25], Ay and Polani
[10] consider how to measure causal information flow via interventional conditional
probabilities. For instance, an interventional conditional probability $p(a \mid \hat{s})$
considers the distribution of $a$ resulting from imposing the value of $s$ (the hat marks an
imposed variable). Imposing means intervening in the system to set the value of the imposed
variable, and is at the essence of the definition of causal information flow.

In a similar fashion to the definition of transfer entropy as the deviation of a destination
from stochastic independence on the source in the context of the destination's past, Ay and
Polani propose the measure information flow as the deviation of the destination $B$ from
causal independence on the source $A$ imposing another set of nodes $S$. Mathematically, this
is written as:



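$$I_p(A \to B \mid \hat{S}) = \sum_{q} \hat{p}(s, a, b) \log_2 \frac{p(b \mid \hat{a}, \hat{s})}{\sum_{a'} p(a' \mid \hat{s})\, p(b \mid \hat{a}', \hat{s})}, \qquad (9)$$

with $q$ representing the tuple $(s, a, b)$ and the modified interventional distribution
defined as:

$$\hat{p}(s, a, b) := p(s)\, p(a \mid \hat{s})\, p(b \mid \hat{a}, \hat{s}). \qquad (10)$$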

The value of the measure is dependent on the choice of the set of nodes $S$. To obtain the
direct causal information flow from $A$ to $B$ we must either include all possible other
sources in $S$ or at least include enough sources to block all non-immediate directed paths
from $A$ to $B$ [10]. The minimum to satisfy this is the set of all direct causal sources of
$B$ excluding $A$, including any past states of $B$ that are direct causal sources. For
computing direct information flow across one cell to the right in ECAs (see Fig. 1(c)), where
$a = x_{i-1,n}$ and $b = x_{i,n+1}$, this means $S$ includes the immediate past of the
destination cell and the previous state of the cell on its right (i.e.
$s_{i,1,n} = \{x_{i,n}, x_{i+1,n}\}$). Generalized as $I_p(j)$ for information flow across $j$
cells to the right in any 1D CA, we have:

$$s_{i,j,n} = \{ x_{i+q,n} \mid \forall q : -r \le q \le +r,\ q \neq -j \}. \qquad (11)$$



Establishing the value of $I_p(A \to B \mid \hat{S})$ requires determination of the underlying
interventional conditional probabilities. By definition these may be gleaned by observing the
results of intervening in the system; however, this is not possible in many cases.

One alternative is to use detailed knowledge of the dynamics, in particular the structure of
the causal links and possibly the underlying rules of the causal interactions. This knowledge
is also often unavailable, and indeed is often the very goal for which one turned to such
analysis in the first place. Regardless, where such knowledge is available it may allow one to
make direct inferences, e.g. under complete determination of the observed variable by the
imposing set (e.g. $p(b \mid \hat{a}, \hat{s})$ in ECAs in Fig. 1(c)), or where the observed
variable remains unaffected by the imposition (e.g. $p(a \mid \hat{s})$ in ECAs in Fig. 1(c)),
allowing one to use the observational probabilities alone independently of the imposed
variable.

FIG. 1: Measures of information transfer and causality measured across one cell to the right
in ECAs. (a) Apparent transfer entropy: information contained in the source cell $X_{i-1}$
about the next state of the destination cell $X_i$ at time $n+1$ that was not contained in the
destination's past. (b) Complete transfer entropy: information contained in the source cell
$X_{i-1}$ about the next state of the destination cell $X_i$ at time $n+1$ that was not
contained in either the destination's past or the other information contributing cell
$X_{i+1}$. As per Section IV C, transfer entropy should only be interpreted as information
transfer when measured from within the past light-cone of the destination. (c) Information
flow: the contribution of a causal effect from source cell $X_{i-1}$ to the next state of the
destination cell $X_i$ at time $n+1$, imposing the previous states of the destination cell and
the other information contributing cell $X_{i+1}$; here the source $a = x_{i-1,n}$, the
destination $b = x_{i,n+1}$, the imposed contributors are $s = \{x_{i,n}, x_{i+1,n}\}$ and the
cells blocking a back-door path relative to $(s, a)$ are
$u = \{x_{i-1,n-1}, x_{i,n-1}, x_{i+1,n-1}, x_{i+2,n-1}\}$.

Furthermore, certain cases exist where one can construct these values from observational
probabilities only [10]. For example, the "back-door adjustment" (Section 3.3.1 of [25]) [30]
suggests that where a set of nodes $U$ satisfies the "back-door criteria" relative to $(X, Y)$,
i.e. that:

1. "no node in $U$ is a descendant of" $X$, and

2. "$U$ blocks every path between" $X$ and $Y$ that contains a directed causal link into $X$;

then the interventional conditional probability $p(y \mid \hat{x})$ is given by:

$$p(y \mid \hat{x}) = \sum_{u} p(y \mid x, u)\, p(u). \qquad (12)$$

The back-door adjustment could be applied to $p(a \mid \hat{s})$ in ECAs in Fig. 1(c) with the
set of nodes satisfying the back-door criteria marked there as $u$; for
$p(b \mid \hat{a}, \hat{s})$ the set $u_2 = \{u, x_{i-2,n-1}\}$ would be used. In general, note
that the back-door adjustment can only be applied for the information flow in isolation (i.e.
without knowledge of the underlying rules of causal interactions) where all relevant
combinations are observed (i.e. for $(y, x, u)$ where $p(y, x, u)$ is strictly positive [10]).
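
The back-door adjustment of Eq. (12) can be sketched numerically on a small discrete example. The confounded toy system below is our own assumption for illustration and is not taken from this paper: $U$ drives both $X$ and $Y$, and $X$ has no causal effect on $Y$, so the adjusted probability should equal the marginal $p(y)$ even though the observational conditional does not. The function name `backdoor_adjust` is likewise ours.

```python
from collections import defaultdict

def backdoor_adjust(joint, x, y):
    """Estimate p(y | do(x)) as sum_u p(y | x, u) p(u).
    `joint` maps (u, x, y) tuples to observational probabilities."""
    p_u = defaultdict(float)
    p_ux = defaultdict(float)
    for (u_, x_, _), p in joint.items():
        p_u[u_] += p
        p_ux[(u_, x_)] += p
    adjusted = 0.0
    for u_, pu in p_u.items():
        if pu == 0.0:
            continue
        if p_ux[(u_, x)] == 0.0:
            # The adjustment needs every relevant (x, u) combination to be observed.
            raise ValueError("combination (x, u) never observed")
        adjusted += joint.get((u_, x, y), 0.0) / p_ux[(u_, x)] * pu
    return adjusted

# Toy system: U ~ Bernoulli(0.5); X copies U with prob. 0.9; Y copies U with prob. 0.8.
joint = {
    (u, x, y): 0.5 * (0.9 if x == u else 0.1) * (0.8 if y == u else 0.2)
    for u in (0, 1) for x in (0, 1) for y in (0, 1)
}
p_x1 = sum(p for (u, x, y), p in joint.items() if x == 1)
observational = sum(p for (u, x, y), p in joint.items() if x == 1 and y == 1) / p_x1
interventional = backdoor_adjust(joint, x=1, y=1)
# observational = 0.74 (correlation via U); interventional = 0.5 (no causal effect)
```

The explicit error for unobserved combinations mirrors the strict positivity requirement noted above.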



B. Local information flow 



We can define a local information flow:

$$f(a \to b \mid \hat{s}) = \log_2 \frac{p(b \mid \hat{a}, \hat{s})}{\sum_{a'} p(a' \mid \hat{s})\, p(b \mid \hat{a}', \hat{s})} \qquad (13)$$

in a similar manner to the localization performed for the transfer entropy. The meaning of the
local information flow is slightly different however. Certainly, it is an attribution of local
causal effect of $a$ on $b$ were $s$ imposed at the given observation $(a, b, s)$. However, one
must be aware that $I_p(A \to B \mid \hat{S})$ is not the average of the local values
$f(a \to b \mid \hat{s})$ over the observations. Unlike the transfer entropy, the information
flow is averaged over the modified interventional distribution $\hat{p}(s, a, b)$: a product
of interventional conditional probabilities (see Eq. (10)) which in general does not reduce
down to the probability of the given observation $p(s, a, b)$. For example, it is possible
that not all of the tuples $(a, b, s)$ will actually be observed, so averaging over
observations would ignore the important contribution that any unobserved tuples provide to the
determination of information flow.
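
The distinction between the local values and their interventional average can be sketched for an ECA, where knowledge of the rule makes $p(b \mid \hat{a}, \hat{s})$ deterministic. The sketch below considers flow across one cell to the right, with $a$ the left neighbor and $s = (c, r)$ the previous states of the destination and its right neighbor. It is an illustration under stated assumptions: the function names are ours, $p(a \mid \hat{s})$ is supplied by the caller as a distribution `p_a` that does not depend on $s$, and the uniform inputs in the example are placeholders rather than the distributions actually produced by any CA.

```python
from math import log2

def eca_output(rule, a, c, r):
    """Next state of a cell with left neighbor a, own state c, right neighbor r."""
    return (rule >> (4 * a + 2 * c + r)) & 1

def local_information_flow(rule, a, c, r, p_a):
    """Local flow (bits) from the left neighbor, imposing a and s = (c, r).
    Uses knowledge of the rule: p(b | a, s imposed) is 1 for the rule's output b."""
    b = eca_output(rule, a, c, r)
    # Denominator: total weight of source values a' that lead to the same output b.
    denom = sum(p_a[a2] for a2 in (0, 1) if eca_output(rule, a2, c, r) == b)
    return log2(1.0 / denom)

def information_flow(rule, p_a, p_s):
    """Average of the local flow over the modified distribution p(s) p(a | s imposed)."""
    return sum(
        p_s[(c, r)] * p_a[a] * local_information_flow(rule, a, c, r, p_a)
        for (c, r) in p_s
        for a in (0, 1)
        if p_a[a] > 0
    )

uniform_a = {0: 0.5, 1: 0.5}
uniform_s = {(c, r): 0.25 for c in (0, 1) for r in (0, 1)}
flow_rule_54 = information_flow(54, uniform_a, uniform_s)   # 0.5 bits for these inputs
```

Because the average runs over every $(s, a)$ combination weighted by $p(s)\,p(a \mid \hat{s})$, tuples that a given CA never produces still contribute, which is exactly why this average differs from an average over observed local values.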

For lattice systems such as CAs, we use the notation $f(i, j, n+1)$ to denote the local
information flow into agent $X_i$ from the source agent $X_{i-j}$ at time step $n+1$ (i.e.
flow across $j$ cells to the right), giving:

$$f(i, j, n+1) = \log_2 \frac{p(x_{i,n+1} \mid \hat{x}_{i-j,n}, \hat{s}_{i,j,n})}{d(i, j, n+1)}, \qquad (14)$$

$$d(i, j, n+1) = \sum_{x'_{i-j,n}} p(x'_{i-j,n} \mid \hat{s}_{i,j,n})\, p(x_{i,n+1} \mid \hat{x}'_{i-j,n}, \hat{s}_{i,j,n}), \qquad (15)$$

with $s_{i,j,n}$ defined in Eq. (11).



IV. APPLICATION TO CELLULAR AUTOMATA

Here, we apply the local transfer entropy and local information flow to the raw states of ECA
rule 54 in Fig. 2. This rule exhibits a (spatially and temporally) periodic background domain,
with gliders traveling across the domain and colliding with one another, forming the basis of
an emergent intrinsic computation.

Focusing on transfer and flow one step to the right per unit time step, we measure the average
transfer values as $T(j = 1, k = 16) = 0.080$ bits and $T^c(j = 1, k = 16) = 0.193$ bits for
apparent and complete transfer entropy respectively, and the information flow as
$I_p(j = 1) = 0.523$ bits. Much more insight is provided by examining the local values of each
measure however, and we examine four cases within these results to highlight the differences
in the measures and indeed in the concepts of information transfer and causal effect. For
measuring the information flow, $p(b \mid \hat{a}, \hat{s})$ is measured using observations
only (unless otherwise stated) to minimize reliance on knowledge of the underlying dynamics.

A. Background domains are highly causal

As an extension of the example of coupled Markov chains in [10] to more complex dynamics, we
first look at the background domain region of the CA where each cell executes a periodic
sequence of states. The four time step period of the (longest) sequences is longer than any
one binary-state cell could produce alone: the cells rely on interaction with their neighbors
to produce these long sequences. We see that the local transfer entropies
$t(i, j = 1, n, k = 16)$ and $t^c(i, j = 1, n, k = 16)$ measure vanishing information transfer
here in Fig. 2(d) and Fig. 2(e), while the local information flow $f(i, j = 1, n)$ in
Fig. 2(b) measures a periodic pattern of causal effect at similar levels to those in the
glider/blinker regions.

Both results are correct, but from different perspectives. From a computational perspective,
the cells in the domain region are executing information storage processes: their futures are
(almost) completely predictable from their pasts. Note that to achieve these long periods,
some of this information is stored in neighbors and retrieved after a few time steps [13]
(achieving a stigmergic information storage, similar to [27]). As such, there is vanishing
information transfer here. On the other hand, much of the background domain is highly causal
because had one imposed values on the sources there the destinations would have changed; hence
we find the strong patterns of information flow here. We can also interpret this result by
noting that the long periodic sequences in the background domain are underpinned by causal
effect between the neighbors.

B. Gliders distinguished as emergent information transfer

We then examine the measurements at the gliders, the emergent structures which propagate
against the background domain. Here we see that the local transfer entropies
$t(i, j = 1, n, k = 16)$ and $t^c(i, j = 1, n, k = 16)$ measure strong information transfer in
the direction of glider motion in Fig. 2(d) and Fig. 2(e) [1], while the local information
flow $f(i, j = 1, n)$ in Fig. 2(b) measures similar levels of causal effect to those in the
background domain.

Again, both results are correct from different perspectives. The cell states in the glider
region provide strong predictive information about the next states in the direction of glider
motion: this is why gliders have long been said to transfer information about the dynamics in
one part of the CA to another (as quantified by the local transfer entropy [1]). For this
reason, we say that predictive transfer is the concept that is more closely aligned with the
popularly understood concept of information transfer. From a causal perspective, the same CA
rules executed in the glider are also executed elsewhere in the domain of the CA: while
imposing the source value does indeed have a causal effect on the destination in the gliders,
the positive directional information flow here is no greater than levels observed in the
domain. The measure certainly captures the causal flow in the gliders, but its localization
does not distinguish that from the flow in the domain.

It is possible that a macroscopic formulation of the information flow might distinguish
gliders as highly causal macroscopic structures, but certainly (when applied to the same
source and destination pair as transfer entropy) as a directional measure of direct local
causal effect it does not distinguish these emergent structures. In this form, the causal
perspective focuses on the details or micro-level of the dynamics, whereas the predictive or
computational perspective takes a macroscopic view of emergent structures. It is the
examination in the context of the past $k$ states that affords this macroscopic view to the
transfer entropy. On the other hand, information flow intrinsically cannot consider the
context of the past, since imposing on $x_{i-j,n}$ and $s_{i,j,n}$ blocks out the influence of
those past $k$ states.



C. Information transfer to be measured from causal sources only

Fig. 2(f) measures the local apparent transfer entropy $t(i, j = 2, n, k = 16)$ for two steps
to the right per unit time step. This profile is fairly similar to that produced for one step
to the right per unit time step. However, this measurement is for superluminal transfer, i.e.
transfer from outside of the past light-cone of $x_{i,n}$. There should not be a real
information transfer here: what we see in this profile merely reflects a correlation between
the purported source and an actual causal source one cell away from the destination. This does
not mean that the transfer entropy measure is wrong, merely that it has not been correctly
applied here. It is only causal sources that are present in Eq. (8) in contributing the
correct information to predict the next state of the destination: in order to be genuinely
interpreted as information transfer, the transfer entropy should only be applied to causal
information sources for the given destination.

FIG. 2: Local transfer entropy and information flow for raw states of rule 54 in (a) (45 time
steps displayed for 45 cells, time increases down the page; all figures gray-scale with 16
levels): (b) Local information flow $f(i, j = 1, n)$ across one cell to the right, with max.
1.07 bits (black); Local complete transfer entropy across one cell to the right: with past
history length $k = 1$ in (c), $t^c(i, j = 1, n, k = 1)$ (max. 1.17 bits (black)), and past
history length $k = 16$ in (d), $t^c(i, j = 1, n, k = 16)$ (max. 9.22 bits (black)); Local
apparent transfer entropy, positive values only: across one cell to the right in (e),
$t(i, j = 1, n, k = 16)$ (max. 7.93 bits (black)), and across two cells to the right in (f),
$t(i, j = 2, n, k = 16)$ (max. 6.00 bits (black)).

To check the correctness of the information flow measure, we apply it here assuming the CA is
of neighborhood-5 (i.e. two causal contributors on either side of the destination). As
expected, the local information flow profile computes no causal effect across two cells to the
right per unit time step (not shown). Importantly however, note that the information flow
could not be measured using observational data alone for either $j = 1$ or $j = 2$ in
neighborhood-5 (since the CA does not produce all of the required $(s, a)$ combinations for
computing $p(b \mid \hat{a}, \hat{s})$); specific knowledge about the dynamics was required for
the calculation.



Furthermore, measuring the complete transfer entropy $t^c(i, j = 2, n, k = 16)$ in this
neighborhood results in a zero information transfer profile (not shown), since all the required
information to predict the next state of the destination is contained within the interior
neighborhood for this deterministic system. This aligns well with the zero result for
information flow. Significantly, only the complete transfer entropy is able to make its
inference using the available observational data alone, though both measures require the
correct neighborhood of other causal contributors to be a subset of those conditioned on or
imposed here.






D. Complete transfer entropy as a next best inference for information flow

The approximation that the complete transfer entropy provides to the information flow goes
beyond similar inference of a lack of influence. Consider the profile of
$t^c(i, j = 1, n, k = 1)$ in Fig. 2(c): note how similar it is to the profile of the local
information flow in Fig. 2(b). This is because with the history length $k = 1$, the complete
transfer entropy measures the information contributed by the source to the destination
conditioning out only the information in other causal contributors. The equation for the local
complete transfer entropy with $k = 1$ (Eq. (6)) is indeed very similar to that for the local
information flow (Eq. (13)), though they are measured over observational and interventional
conditional probabilities respectively. Note also that the average value
$T^c(j = 1, k = 1) = 0.521$ bits is almost identical to the information flow
$I_p(j = 1) = 0.523$ bits. Where one cannot intervene in the system, and does not have the
required observations to use a method such as the back-door adjustment, the local complete
transfer entropy could provide the next best inference for the local information flow profile.
In this case, the history length $k$ should be set to include only the past states of the
destination that are causal information contributors to its next state: no more, no less. (For
example, in [11] where the elements in Henon maps are causally affected by their previous two
states, $k = 2$ would be appropriate rather than the use of $k = 1$ there.) The history length
parameter $k$ therefore has an important role in moving the (complete) transfer entropy
between measuring information transfer (at large $k$) and approximating causal effect (at
minimal $k$).

The complete transfer entropy is therefore a candidate method for inferring causal structure
in a multi-variate time series in these appropriate conditions, so long as one understands it
is neither a direct nor exact measure of causal effect.
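
A minimal sketch of the average complete transfer entropy with $k = 1$, estimated from observations of a simulated ECA, is given below. It is an illustration only: the helper names are ours, periodic boundaries and a random initial row are assumed, and the value it yields for rule 54 on such a short run should not be expected to reproduce the figures quoted above.

```python
from collections import Counter
from math import log2
import numpy as np

def run_eca(rule, width, steps, seed=0):
    """Simulate an ECA with periodic boundaries from a random initial row."""
    rng = np.random.default_rng(seed)
    table = (rule >> np.arange(8)) & 1
    rows = [rng.integers(0, 2, size=width)]
    for _ in range(steps):
        x = rows[-1]
        rows.append(table[4 * np.roll(x, 1) + 2 * x + np.roll(x, -1)])
    return np.array(rows)

def complete_te_k1(ca):
    """Plug-in estimate (bits) of T^c(j = 1, k = 1): information from the left
    neighbor about the next state, conditioned on the destination's previous
    state and its right neighbor."""
    a = np.roll(ca[:-1], 1, axis=1).ravel().tolist()    # source x_{i-1,n}
    c = ca[:-1].ravel().tolist()                        # destination past x_{i,n}
    r = np.roll(ca[:-1], -1, axis=1).ravel().tolist()   # other contributor x_{i+1,n}
    b = ca[1:].ravel().tolist()                         # next state x_{i,n+1}
    full = Counter(zip(b, c, r, a))
    cond_src = Counter(zip(c, r, a))
    next_cond = Counter(zip(b, c, r))
    cond = Counter(zip(c, r))
    total = len(b)
    te = 0.0
    for (b_, c_, r_, a_), count in full.items():
        p_with = count / cond_src[(c_, r_, a_)]
        p_without = next_cond[(b_, c_, r_)] / cond[(c_, r_)]
        te += (count / total) * log2(p_with / p_without)
    return te

tc_rule_54 = complete_te_k1(run_eca(54, width=500, steps=200, seed=2))
```

Comparing this observational quantity with an interventional estimate of the information flow for the same system is one way to check how good the approximation is in a given case.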

Importantly, the complete transfer entropy must condition on (at least) the correct
neighborhood of causal sources in order to provide the best approximation of the information
flow. Since this is exactly what is being searched for in this circumstance, one would in fact
need to build knowledge of the causal contributors for a given destination by incrementally
conditioning on previously inferred sources (reminiscent of Eq. (8)). This would be done by
incrementally selecting the source which provides the most statistically significant transfer
entropy conditioned on the previously selected sources (this combines the multi-variate source
selection of [3] with the complete transfer entropy and the statistical significance tests of
[13]). Testing this method is left for future work.

Importantly also, while the complete transfer entropy can at least function in the absence of
observations spanning all possible combinations of the variables, if crucial combinations are
not observed it can give quite incorrect inferences here. For example, consider the classical
causal example of a short circuit which causes a fire in the presence of certain conditions
(e.g. with inflammable material), while the fire can also be started in other ways (e.g.
overturning a lighted oil stove) [28]. If one never observes the short circuit in the right
conditions, without the other fire triggers, the transfer entropy is in fact unable to infer a
causal link from the short circuit to the fire.



V. DISCUSSION AND CONCLUSION 

The concepts of information transfer and causal ef- 
fect have often been confused. In this paper, we have 
demonstrated the complementary nature of these con- 
cepts while emphasizing the distinctions between them. 
On an information-theoretical basis, information flow 
quantifies causal effect using an interventionist perspec- 
tive, while transfer entropy quantifies information trans- 
fer by measuring a (conditional) correlation on a causal 
channel. We have explored the subtle yet distinct differ- 
ences between these concepts using a local scale within 
cellular automata. 

Causal effect is a fundamental micro-level property of 
a system. Information flow should be used as a primary 
tool (where possible) to establish the presence of and 
quantify causal relationships. Where this is not possible 
(e.g. where one has no ability to intervene in the system, 
no knowledge of the underlying dynamics, and cannot ap- 
ply a method such as the back-door adjustment to obser- 
vational data), then the complete transfer entropy (with 
history length k set to a minimal value) is an alternate 
inference technique. The apparent transfer entropy is not 
applicable here since it cannot discern correlation from 
causal effect, and neither apparent nor complete trans- 
fer entropy with large k is suitable since these measure 
predictive information transfer rather than direct causal 
effect. Note that for both the information flow and the complete transfer entropy, it is
crucial that they be applied imposing or conditioning on the correct set of other causal
variables; the task of building knowledge of this correct set is left for investigation in
future work.

Information transfer can then be analyzed in order to 
gain insight into the emergent computation being car- 
ried out by the system. Importantly, the transfer entropy 
should only be measured for causal information contrib- 
utors to the destination, otherwise its result cannot be 
interpreted as information transfer. To do so, both the 
apparent and complete transfer entropy should be used, 
with history length k set as large as possible. These are 
complementary measures which allow one to assess the 
composition of information storage, transfer and interactions in a system [13]. Information
flow is not suitable for the analysis of emergent computation, since in representing causal
effect it takes too microscopic a viewpoint, and provides no method for describing the
composition of information in the computation.